8353230: Emoji rendering regression after JDK-8208377 #24412

gredler · 2025-04-03T11:23:42Z

It looks like this regression actually fits into a longer series of fixes / regressions in this area:

JDK-4517298 fixed metrics for zero-width characters, but broke some ligatures / glyph substitutions
JDK-7017058 fixed the ligatures / glyph substitutions, but broke some zero-width metrics
JDK-8208377 fixed some metrics and rendering for zero-width characters, but broke some ligatures / glyph substitutions
Now, with this PR, we aim to fix the ligatures without re-breaking zero-width metrics and display

We have two different types of use cases pulling CharToGlyphMapper in two different directions: the users who need raw, untransformed glyph info, and the users who need normalized / transformed glyph info.

It looks to me like, in the current code base, the only CharToGlyphMapper user which requires raw font data is HarfBuzz (explicitly confirmed with the HarfBuzz team here: harfbuzz/harfbuzz#5234).

The regression mechanism at play here is that the HarfBuzz font callbacks are currently providing HarfBuzz with transformed glyph info (e.g. ZWJ -> INVISIBLE_GLYPH_ID), which prevents HarfBuzz from recognizing and applying the correct font GSUB substitutions (which involve ZWJ).
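To make the mechanism concrete, here is a toy model of it. This is only a sketch: the glyph ids, the three-entry cmap, the single ligature rule and the `shape` helper are all invented for illustration, the `0xffff` sentinel value is an assumption, and none of this is the real HarfBuzz or JDK code. It shows why a substitution rule keyed on the raw ZWJ glyph stops matching once the callback hands back the invisible glyph instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;
import java.util.stream.Collectors;

// Toy model only: the glyph ids, the cmap and the ligature rule are invented.
class ZwjSubstitutionDemo {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value
    static final int ZWJ = 0x200D;                // ZERO WIDTH JOINER

    // Hypothetical cmap: code point -> raw glyph id.
    static final Map<Integer, Integer> CMAP =
            Map.of(0x1F468, 10, ZWJ, 11, 0x1F469, 12);

    // Hypothetical GSUB-style ligature rule keyed on a raw glyph sequence.
    static final Map<List<Integer>, Integer> LIGATURES =
            Map.of(List.of(10, 11, 12), 99);

    // Raw lookup: whatever the font's cmap says (0 = missing glyph).
    static int rawGlyph(int codePoint) {
        return CMAP.getOrDefault(codePoint, 0);
    }

    // Transformed lookup: ZWJ is replaced by the invisible glyph.
    static int transformedGlyph(int codePoint) {
        return codePoint == ZWJ ? INVISIBLE_GLYPH_ID : rawGlyph(codePoint);
    }

    // Maps code points through the callback, then applies the ligature
    // rule if the resulting glyph sequence matches it exactly.
    static List<Integer> shape(int[] codePoints, IntUnaryOperator glyphCallback) {
        List<Integer> glyphs = Arrays.stream(codePoints)
                .map(glyphCallback)
                .boxed()
                .collect(Collectors.toList());
        Integer ligature = LIGATURES.get(glyphs);
        return ligature != null ? List.of(ligature) : glyphs;
    }

    public static void main(String[] args) {
        int[] text = {0x1F468, ZWJ, 0x1F469};
        System.out.println(shape(text, ZwjSubstitutionDemo::rawGlyph));         // [99]
        System.out.println(shape(text, ZwjSubstitutionDemo::transformedGlyph)); // [10, 65535, 12]
    }
}
```

With the raw callback the three glyphs collapse into the ligature; with the transformed callback the rule never fires and the sequence is left unsubstituted.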

In order to fix this without (yet again) breaking metrics and display behavior elsewhere, I've added two methods to CharToGlyphMapper which provide access to raw glyph info, to be used by the HarfBuzz font callbacks: charToGlyphRaw(int) and charToVariationGlyphRaw(int).
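A rough sketch of the split between the two kinds of lookup, for readers who have not opened the diff. The `GlyphMapperSketch` interface, the `ToyGlyphMapper` class, the two-character `isIgnorable` check and the `0xffff` value are assumptions made for this example; the real CharToGlyphMapper has more methods and a fuller default-ignorable check. Only the charToGlyph / charToGlyphRaw pairing is taken from the PR.

```java
import java.util.Map;

// Hypothetical stand-in for the mapper API; not the real sun.font class.
interface GlyphMapperSketch {
    int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value

    int charToGlyph(int unicode);    // transformed: for metrics and display
    int charToGlyphRaw(int unicode); // raw: for shaping engine callbacks
}

class ToyGlyphMapper implements GlyphMapperSketch {
    private final Map<Integer, Integer> cmap;

    ToyGlyphMapper(Map<Integer, Integer> cmap) {
        this.cmap = cmap;
    }

    // Simplified assumption: only ZWNJ and ZWJ count as ignorable here.
    static boolean isIgnorable(int unicode) {
        return unicode == 0x200C || unicode == 0x200D;
    }

    // Both public methods share one lookup; the flag only decides
    // whether the final transformation step is applied.
    private int getGlyph(int unicode, boolean raw) {
        int glyph = cmap.getOrDefault(unicode, 0); // 0 = missing glyph
        if (!raw && isIgnorable(unicode)) {
            return INVISIBLE_GLYPH_ID;
        }
        return glyph;
    }

    @Override
    public int charToGlyph(int unicode) {
        return getGlyph(unicode, false);
    }

    @Override
    public int charToGlyphRaw(int unicode) {
        return getGlyph(unicode, true);
    }

    public static void main(String[] args) {
        GlyphMapperSketch mapper = new ToyGlyphMapper(Map.of(0x41, 5, 0x200D, 11));
        System.out.println(mapper.charToGlyph(0x200D));    // 65535
        System.out.println(mapper.charToGlyphRaw(0x200D)); // 11
    }
}
```

A shaping callback would call `charToGlyphRaw`, while code that measures or paints text would keep calling `charToGlyph`.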

Note two intricacies related to CompositeGlyphMapper:

We need to be careful to only cache raw (untransformed) values, to avoid conflicts between requests for a raw version of a glyph and a transformed version of the same glyph. Another option would have been two separate caches, but I don't think that's necessary.
Consumers who are using CompositeGlyphMapper.SLOTMASK to check glyph slots (e.g. FontRunIterator and CTextPipe) will "see" invisible glyphs as having come from slot 0. This isn't new, and I think it's OK, but something to be aware of.
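Both intricacies can be sketched in a few lines. Everything here is illustrative: the `RawOnlyGlyphCache` class, its HashMap-backed cache and the `slotOf` helper are invented for this example, and the layout (slot index in the top 8 bits, `SLOTMASK = 0xff000000`, invisible glyph `0xffff`) is an assumption about how composite glyph codes are packed, not something quoted from the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Illustrative sketch of a cache that stores only raw glyph codes.
class RawOnlyGlyphCache {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value
    static final int SLOTMASK = 0xff000000;       // assumed: slot in top 8 bits

    private final Map<Integer, Integer> cache = new HashMap<>();
    private final IntUnaryOperator rawLookup; // hypothetical slot-aware lookup

    RawOnlyGlyphCache(IntUnaryOperator rawLookup) {
        this.rawLookup = rawLookup;
    }

    // Simplified assumption: only ZWNJ and ZWJ count as ignorable here.
    static boolean isIgnorable(int unicode) {
        return unicode == 0x200C || unicode == 0x200D;
    }

    int getGlyph(int unicode, boolean raw) {
        Integer glyph = cache.get(unicode);
        if (glyph == null) {
            glyph = rawLookup.applyAsInt(unicode);
            cache.put(unicode, glyph); // only the raw value is ever cached
        }
        // The transformation is applied on the way out, never stored,
        // so raw and transformed requests cannot pollute each other.
        if (!raw && isIgnorable(unicode)) {
            return INVISIBLE_GLYPH_ID;
        }
        return glyph;
    }

    // Extracts the slot index from a composite glyph code.
    static int slotOf(int compositeGlyph) {
        return (compositeGlyph & SLOTMASK) >>> 24;
    }

    public static void main(String[] args) {
        // Pretend every character resolves to glyph 11 in slot 2.
        RawOnlyGlyphCache cache = new RawOnlyGlyphCache(cp -> (2 << 24) | 11);
        int transformed = cache.getGlyph(0x200D, false);
        int raw = cache.getGlyph(0x200D, true);
        System.out.println(slotOf(raw));         // 2
        System.out.println(slotOf(transformed)); // 0
    }
}
```

Because the invisible glyph id has no slot bits set, a consumer masking with SLOTMASK sees slot 0 for it regardless of which slot actually supplied the raw glyph.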

The glyph cache handling in CCharToGlyphMapper (for macOS) also requires care to avoid mixing value types.

Please also note that I'm not sure if the tweak to sunFont.c is being tested, since FFM is being used by default for HarfBuzz integration. (Is there a plan to remove the JNI version soon?)

This PR includes a self-contained regression test. It includes a small font created just for this test, which exercises the ligature / glyph substitution infrastructure. The font tests, including the new regression test, all pass locally on Linux, Windows and macOS (make test TEST="jtreg:test/jdk/java/awt/font").

Interestingly, the changes for JDK-7017058 (mentioned above) included a test (ZWJLigatureTest) which I think would have caught this last regression, but it depends on optional Windows fonts which I guess do not exist on any commonly-used test infrastructure. This should not be an issue with the new test, since it does not depend on any external fonts.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8353230: Emoji rendering regression after JDK-8208377 (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24412/head:pull/24412
$ git checkout pull/24412

Update a local copy of the PR:
$ git checkout pull/24412
$ git pull https://git.openjdk.org/jdk.git pull/24412/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24412

View PR using the GUI difftool:
$ git pr show -t 24412

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24412.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-04-03T11:24:16Z

👋 Welcome back dgredler! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-04-03T11:24:31Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-04-03T11:25:38Z

@gredler The following label will be automatically applied to this pull request:

client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-04-03T11:28:22Z

Webrevs

00: Full (3fdc7901)

YaaZ · 2025-04-15T17:28:12Z

We had similar emoji-related regressions at JetBrains. Although our font-related code diverged from OpenJDK a bit, porting this patch seems to resolve them too. I am not an OpenJDK reviewer, but LGTM nevertheless.

gredler · 2025-04-21T17:43:19Z

@YaaZ Thanks for the information!

@prrace Have you had a chance to look at this PR?

prrace · 2025-04-24T03:04:36Z

It passed all the testing I did. I still need to look hard at the changes.

YaaZ · 2025-04-29T09:55:56Z

By the way, I see that in each implementation, both charToGlyph and charToGlyphRaw call a common method, like getGlyph(int unicode, boolean raw). At first there was just charToGlyph, then charToVariationGlyph was added, and now you have added a "raw" version of each. I can see that we will need other variants in the future, and this is already starting to look like an exponential explosion. Overriding all of those methods in each implementation brings quite a bit of boilerplate, and it becomes easier to miss something. Maybe take a step back and refactor this into a single charToGlyph(int unicode, int variationSelector, boolean raw) version?
Also, this raw parameter only really controls the isDefaultIgnorable check at the end of each method. Maybe we could factor this out without bringing it separately into each mapper implementation?
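One possible shape for the single-method idea, as a sketch only. The `UnifiedGlyphMapper` class, the `NO_VARIATION` constant, the map-backed tables and the fall-back-to-base-glyph behavior are all assumptions made for this example; the thread does not settle any of them, and the real mappers read font tables rather than maps.

```java
import java.util.Map;

// Hypothetical sketch of one lookup method with convenience overloads.
class UnifiedGlyphMapper {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value
    static final int NO_VARIATION = 0;            // hypothetical "no selector" marker

    private final Map<Integer, Integer> cmap;    // code point -> glyph
    private final Map<Long, Integer> variations; // (code point, selector) -> glyph

    UnifiedGlyphMapper(Map<Integer, Integer> cmap, Map<Long, Integer> variations) {
        this.cmap = cmap;
        this.variations = variations;
    }

    // Packs a code point and a variation selector into one map key.
    static long key(int unicode, int variationSelector) {
        return ((long) unicode << 32) | (variationSelector & 0xffffffffL);
    }

    // Simplified assumption: only ZWNJ and ZWJ count as ignorable here.
    static boolean isIgnorable(int unicode) {
        return unicode == 0x200C || unicode == 0x200D;
    }

    // The single method every variant funnels into.
    int charToGlyph(int unicode, int variationSelector, boolean raw) {
        Integer glyph = null;
        if (variationSelector != NO_VARIATION) {
            glyph = variations.get(key(unicode, variationSelector));
        }
        if (glyph == null) {
            // Assumed behavior: fall back to the base glyph (0 = missing).
            glyph = cmap.getOrDefault(unicode, 0);
        }
        if (!raw && isIgnorable(unicode)) {
            return INVISIBLE_GLYPH_ID;
        }
        return glyph;
    }

    // Existing entry points become thin wrappers.
    int charToGlyph(int unicode) {
        return charToGlyph(unicode, NO_VARIATION, false);
    }

    int charToGlyphRaw(int unicode) {
        return charToGlyph(unicode, NO_VARIATION, true);
    }

    public static void main(String[] args) {
        UnifiedGlyphMapper mapper = new UnifiedGlyphMapper(
                Map.of(0x2764, 20, 0x200D, 11),
                Map.of(key(0x2764, 0xFE0F), 21));
        System.out.println(mapper.charToGlyph(0x2764, 0xFE0F, false)); // 21
        System.out.println(mapper.charToGlyphRaw(0x200D));             // 11
    }
}
```

The trade-off raised in the thread still applies: existing callers either keep wrapper methods like the two above or have to be updated to the three-argument signature.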

gredler · 2025-05-01T20:02:45Z

@YaaZ: Thanks for the additional feedback, please see my thoughts below:

By the way, I see that in each implementation, both charToGlyph and charToGlyphRaw call a common method, like getGlyph(int unicode, boolean raw). At first there was just charToGlyph, then charToVariationGlyph was added, and now you have added a "raw" version of each. I can see that we will need other variants in the future, and this is already starting to look like an exponential explosion.

I don't know if I would call two changes to CharToGlyphMapper in 20 years an exponential explosion, but I get your point :-)

Overriding all of those methods in each implementation brings quite a bit of boilerplate, and it becomes easier to miss something.

True, but again keep in mind that there are only 5 implementations, only one of which (the macOS CCharToGlyphMapper) has been added in the last 20 years.

Maybe take a step back and refactor this into a single charToGlyph(int unicode, int variationSelector, boolean raw) version?

We'd still need separate methods for int vs. char, but I think this might reduce 5 methods down to 3? The changeset would be a bit more intrusive (lots of callers would need to change to reflect the new method signature). I'd be interested to hear thoughts from some of the reviewers on this one.

Also, this raw parameter only really controls the isDefaultIgnorable check at the end of each method. Maybe we could factor this out without bringing it separately into each mapper implementation?

I prefer to think of it as controlling whether or not any transformations to INVISIBLE_GLYPH_ID happen (right now it's just for default-ignorable characters, but there may be other scenarios in the future, e.g. \r, \n and \t which currently are handled elsewhere).

Any ideas for what this refactoring might look like?

YaaZ · 2025-05-01T21:55:00Z

I was talking about the explosion because there is a scenario in my mind, which I didn't make clear for everybody else. There is a change which I didn't have time to contribute, but would like to: it's related to composite fonts and variation selectors. We may need 2 variants for retrieving a glyph with a variation selector - one strictly matching a variation selector and another with a fallback to the base glyph, multiplied by raw/transformed versions, which adds 2 more methods. Not like it's a big problem, but given that they all end up calling a single method anyway... You get the point.

there may be other scenarios in the future, e.g. \r, \n and \t which currently are handled elsewhere

Are those scenarios specific to a particular mapper/font type? I was thinking that those transformations are generic.

Any ideas for what this refactoring might look like?

I was thinking about moving this default-ignorable check, or any other potentially generic transformation, into the base CharToGlyphMapper or even Font2D. For example, make the default implementation of CharToGlyphMapper.charToGlyph check for ignorable characters and then call charToGlyphRaw. Other implementations would then only need to override charToGlyphRaw.
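A minimal sketch of what that could look like, with the caveat that `BaseMapperSketch`, `MapBackedMapper`, the two-character `isIgnorable` check and the `0xffff` value are made up for this example and are not the real sun.font classes:

```java
import java.util.Map;

// Hypothetical base class: the transformation lives here, once.
abstract class BaseMapperSketch {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value

    // Simplified assumption: only ZWNJ and ZWJ count as ignorable here.
    static boolean isIgnorable(int unicode) {
        return unicode == 0x200C || unicode == 0x200D;
    }

    // The only method a concrete mapper has to provide.
    abstract int charToGlyphRaw(int unicode);

    // Generic transformation applied on top of the raw lookup.
    int charToGlyph(int unicode) {
        if (isIgnorable(unicode)) {
            return INVISIBLE_GLYPH_ID;
        }
        return charToGlyphRaw(unicode);
    }
}

// Example implementation that knows nothing about invisible glyphs.
class MapBackedMapper extends BaseMapperSketch {
    private final Map<Integer, Integer> cmap;

    MapBackedMapper(Map<Integer, Integer> cmap) {
        this.cmap = cmap;
    }

    @Override
    int charToGlyphRaw(int unicode) {
        return cmap.getOrDefault(unicode, 0); // 0 = missing glyph
    }

    public static void main(String[] args) {
        BaseMapperSketch mapper = new MapBackedMapper(Map.of(0x41, 5, 0x200D, 11));
        System.out.println(mapper.charToGlyph(0x200D));    // 65535
        System.out.println(mapper.charToGlyphRaw(0x200D)); // 11
    }
}
```

Whether this works for every mapper depends on details the sketch ignores, such as per-implementation glyph caches that would have to keep storing raw values only.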

gredler added 2 commits April 2, 2025 22:38

Differentiate CharToGlyphMapper users who want raw vs normalized glyph info

a8e5905

Finish macOS implementation

3fdc790

openjdk bot added the rfr label Apr 3, 2025

openjdk bot added the client label Apr 3, 2025

gredler mentioned this pull request May 6, 2025

8350203: [macos] Newlines and tabs are not ignored when drawing text to a Graphics2D object #23665

Closed

3 tasks
